konnect2prot Tutorial

1. Homepage

Figure 1 displays the homepage of konnect2prot 2.0, accessible at https://konnect2prot_v2.thsti.in/. Clicking the "Start Here" icon, shown in Figure 1(A), takes users to the main application page. Selecting the "Contact Us" option, shown in Figure 1(B), opens a contact form where users can submit queries about konnect2prot 2.0.

2. Dashboard

Figure 2 illustrates the konnect2prot 2.0 dashboard, which is the first window shown after leaving the homepage. The sidebar contains a set of instructions for guidance, as shown in Figure 2(A). The top navigation bar has four main tabs: Data, EDA (Exploratory Data Analysis), Visualization, and Network Analysis, as depicted in Figure 2(B). Data uploads are handled through the Data tab. A flowchart gives an overview of the portal's workflow, as shown in Figure 2(C). Users with gene expression data should start from the Data tab, users with DEG files in the required format can continue from the Visualization tab, and users with only gene names or UniProt IDs should go to the Network Analysis tab.

3. Data Tab

1. The Data tab has an upload sidebar where you can upload your file either by dragging it in or by clicking the "Drag and Drop files here" area, as shown in Figure 3.
2. The file opens automatically, and a data preview appears to the right of the upload sidebar.
3. A preliminary analysis of the uploaded data is then run.
4. The results are shown graphically as a Box Plot, Mean-Variance Trend, Density plot, and QQ plot, as shown in Figures 4 and 5.
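
To give a feel for what the preliminary plots summarise, the sketch below computes the numbers behind a box plot (five-number summary per sample) and a mean-variance trend (mean and variance per gene). It is only an illustration under stated assumptions: the toy matrix, the gene and sample names, and the helper functions are hypothetical, and the portal's own computation may differ.

```python
import statistics

# Hypothetical toy expression matrix: gene -> values across samples S1..S4.
expression = {
    "GENE_A": [10.0, 12.0, 11.0, 13.0],
    "GENE_B": [100.0, 140.0, 90.0, 150.0],
    "GENE_C": [5.0, 5.0, 6.0, 4.0],
}
sample_names = ["S1", "S2", "S3", "S4"]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def mean_variance(expr):
    """Return gene -> (mean, sample variance), the points of a mean-variance trend."""
    return {
        gene: (statistics.mean(vals), statistics.variance(vals))
        for gene, vals in expr.items()
    }


# One box per sample: collect each sample's column across all genes.
per_sample = {}
for i, name in enumerate(sample_names):
    column = [vals[i] for vals in expression.values()]
    per_sample[name] = five_number_summary(column)

per_gene = mean_variance(expression)

if __name__ == "__main__":
    for name, summary in per_sample.items():
        print(name, summary)
    for gene, (mean, var) in per_gene.items():
        print(gene, round(mean, 2), round(var, 2))
```

Genes whose variance grows with their mean, as GENE_B does in this toy matrix, are the reason a log transform is usually applied before further analysis.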

4. EDA

1. As shown in Figure 6, the Pre-processing Parameters side tab lets you select the Normalization Method, Scaling Method, and Alpha Value (Significance Threshold).
2. The normalization options are log2, log10, and None.
3. The scaling options are Min-Max, Standard Scaling, and None.
4. Set the alpha value according to the needs of your experiment.

5. Group Naming provides an input box for defining the names of the two groups. You only need to make the selection for Set 1; Set 2 is then selected automatically. For example, the data can be grouped into CASE and CONTROL, as shown in Figure 7.
6. After you click Submit, the DEGs table is generated along with a summary of the samples and DEGs, as shown in Figure 8.

7. Users can also explore the 'Pathway Exploration' tab to see a desired pathway containing DEGs, the 'Complex Exploration' tab to see a desired complex containing DEGs, the 'Similarity Exploration' tab to explore the expression similarity of a selected gene in the selected group, and the 'Expression Distribution' (Up to 5 cross validation) tab to see the expression profile of the selected gene(s) in both groups as box plots, as shown in Figure 9.
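
The normalization and scaling options listed in the steps above can be sketched as follows. This is a minimal illustration, not the portal's implementation: the function names, the pseudocount of 1 added before taking logarithms, and the choice to scale one vector of values at a time are all assumptions.

```python
import math
import statistics


def normalize(values, method="log2", pseudocount=1.0):
    """Apply a log2 or log10 transform, or return the values unchanged.

    The pseudocount is an assumption made here to avoid log(0).
    """
    if method == "none":
        return list(values)
    if method == "log2":
        return [math.log2(v + pseudocount) for v in values]
    if method == "log10":
        return [math.log10(v + pseudocount) for v in values]
    raise ValueError(f"unknown normalization method: {method}")


def scale(values, method="min-max"):
    """Rescale values with Min-Max or Standard (z-score) scaling, or not at all."""
    if method == "none":
        return list(values)
    if method == "min-max":
        lo, hi = min(values), max(values)
        span = hi - lo
        return [(v - lo) / span if span else 0.0 for v in values]
    if method == "standard":
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        return [(v - mu) / sd if sd else 0.0 for v in values]
    raise ValueError(f"unknown scaling method: {method}")


if __name__ == "__main__":
    raw = [1, 3, 7]                       # hypothetical raw expression values
    logged = normalize(raw, "log2")       # log2(v + 1)
    print(logged)
    print(scale(logged, "min-max"))
    print(scale(logged, "standard"))
```

Min-Max scaling maps the values onto the range 0 to 1, whereas Standard Scaling centres them on zero with unit standard deviation; which is preferable depends on the downstream analysis.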

5. Visualization

1. This section provides a volcano plot, pathway enrichment, PCA of the complete gene expression data, and significant gene expression, as shown in Figures 10(A) and 10(B).
2. In the volcano plot, use the slider to change the log2 fold change value.

3. Users can upload a DEG file in the specified format (as shown in Figure 11) to visualize the volcano plot and perform pathway enrichment analysis.
4. Generating PCA plots, however, requires a complete gene expression file.
5. Once the analysis is complete, users can proceed to the Network Analysis tab for further exploration.
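
The effect of moving the log2 fold change slider on a volcano plot can be pictured with the sketch below. It assumes that each DEG record carries a gene identifier, a log2 fold change, and a p-value; the field layout, the gene names, and the default cutoffs are hypothetical and do not describe the file format shown in Figure 11.

```python
import math


def classify(log2_fc, p_value, fc_cutoff=1.0, alpha=0.05):
    """Label a gene as 'up', 'down', or 'ns' (not significant)."""
    if p_value < alpha and log2_fc >= fc_cutoff:
        return "up"
    if p_value < alpha and log2_fc <= -fc_cutoff:
        return "down"
    return "ns"


def volcano_points(deg_rows, fc_cutoff=1.0, alpha=0.05):
    """Return (gene, x, y, label) with x = log2 fold change and y = -log10(p)."""
    points = []
    for gene, log2_fc, p_value in deg_rows:
        label = classify(log2_fc, p_value, fc_cutoff, alpha)
        points.append((gene, log2_fc, -math.log10(p_value), label))
    return points


# Hypothetical DEG records: (gene, log2 fold change, p-value).
deg_rows = [
    ("GENE_A", 2.0, 0.01),
    ("GENE_B", -1.5, 0.001),
    ("GENE_C", 0.5, 0.001),
    ("GENE_D", 3.0, 0.20),
]

if __name__ == "__main__":
    # Raising the cutoff, as the slider does, relabels borderline genes as 'ns'.
    for cutoff in (1.0, 2.5):
        labels = [p[3] for p in volcano_points(deg_rows, fc_cutoff=cutoff)]
        print(cutoff, labels)
```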

6. Network Analysis

Here, we have searched konnect2prot (k2p) using an example gene, "CDK1". The protein-protein interaction (PPI) network of CDK1 and its first neighbours is constructed in the right-hand panel. This network can be filtered further by "localization", "molecular functions", "biological processes", "tissue-specificity" or "pathways". An example is shown in Figure 12. For smooth visualisation, k2p provides different layout options, which can be found in the layout tab, as shown in Figure 13. Click the analysis button to find the enriched pathways and ontologies, the multi-disease interactome, and the topological analysis.
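
The idea of a first-neighbour network, followed by an annotation filter, can be sketched in a few lines. Everything other than the query name CDK1 is a placeholder: the neighbour identifiers, the edge list, and the localization labels are hypothetical and were not retrieved from k2p.

```python
from collections import defaultdict

# Placeholder edge list for illustration only.
edges = [
    ("CDK1", "NEIGHBOUR_A"),
    ("CDK1", "NEIGHBOUR_B"),
    ("CDK1", "NEIGHBOUR_C"),
    ("NEIGHBOUR_A", "NEIGHBOUR_C"),
    ("NEIGHBOUR_C", "DISTANT_D"),   # two steps away from CDK1
]

# Hypothetical annotation used to demonstrate filtering.
localization = {
    "CDK1": "nucleus",
    "NEIGHBOUR_A": "nucleus",
    "NEIGHBOUR_B": "cytoplasm",
    "NEIGHBOUR_C": "nucleus",
    "DISTANT_D": "cytoplasm",
}


def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def first_neighbour_network(edge_list, query):
    """Keep the query, its direct neighbours, and the edges among them."""
    adj = build_adjacency(edge_list)
    nodes = {query} | adj[query]
    kept = [(a, b) for a, b in edge_list if a in nodes and b in nodes]
    return nodes, kept


def filter_by_annotation(nodes, kept_edges, annotation, value, query):
    """Keep the query plus the nodes whose annotation equals the given value."""
    keep = {n for n in nodes if n == query or annotation.get(n) == value}
    return keep, [(a, b) for a, b in kept_edges if a in keep and b in keep]


nodes, sub_edges = first_neighbour_network(edges, "CDK1")
nuclear_nodes, nuclear_edges = filter_by_annotation(
    nodes, sub_edges, localization, "nucleus", "CDK1"
)

if __name__ == "__main__":
    print(sorted(nodes))
    print(sorted(nuclear_nodes))
```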

Clicking a protein in the constructed PPI network shows how many PDB structures are available for it. The ligand panel provides information on small compounds and their mode of action for the query protein. The "mutation in disease" panel contains information on disease-specific mutations of this protein (if any). Clicking an edge in the network shows information about the mode of interaction. An example is shown in Figure 13.

a. Enrichment panel

After you click the "analysis" button, the enrichment panel displays the enriched pathways and processes for the proteins in the constructed PPI network. k2p also provides the protein class abundance and the multi-disease landscape of the proteins in the PPI network. This information is shown in Figure 14.
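
This tutorial does not state which statistical test k2p uses for enrichment. As a generic illustration only, the sketch below shows a hypergeometric over-representation test, a common way to ask whether a pathway contains more network proteins than expected by chance. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Probability of drawing at least k pathway members.

    N: size of the background protein set
    K: number of background proteins annotated to the pathway
    n: number of proteins in the network
    k: number of network proteins annotated to the pathway
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


if __name__ == "__main__":
    # Hypothetical counts: 8 of 20 network proteins fall in a 50-protein
    # pathway, against a background of 2000 proteins.
    p = hypergeom_pvalue(k=8, n=20, K=50, N=2000)
    print(f"p = {p:.3e}")
```

In practice, p-values from many pathways are corrected for multiple testing before being reported as enriched.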

b. Topological panel

The topological panel (see Figure 15) shows the results of three key centrality measures: degree, betweenness, and closeness centrality. A plot of degree versus betweenness is also included to identify the proteins that act as hubs and bottlenecks in the constructed PPI network. For a detailed account of the different centrality measures and their applications, please refer to [1].
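
For readers who want to see how the three measures are defined, the sketch below computes them on a small undirected, connected toy graph using only the standard library. It is an illustration, not k2p's code: the node names and edges are hypothetical, betweenness is left unnormalized, and closeness assumes every node can reach every other node.

```python
from collections import deque


def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}


def closeness_centrality(adj):
    """(Number of reachable nodes) / (sum of shortest-path distances to them)."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(adj):
    """Unnormalized shortest-path betweenness via Brandes' algorithm."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)   # number of shortest paths from s
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):         # accumulate from the farthest nodes
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was counted twice


# Hypothetical toy network: H is a hub, C bridges the leaf D to the rest.
toy = build_adjacency([("H", "A"), ("H", "B"), ("H", "C"), ("C", "D")])

if __name__ == "__main__":
    print(degree_centrality(toy))
    print(closeness_centrality(toy))
    print(betweenness_centrality(toy))
```

In the toy network, H has the highest degree (a hub), while C has a low degree but non-zero betweenness because every path to D passes through it (a bottleneck). This is the distinction the degree versus betweenness plot is meant to expose.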

c. Spreaders

We have identified the influential spreaders in the network and augmented them with auxiliary information. Identifying a set of influential spreaders in a complex network plays a crucial role in effective information spreading; k2p identifies them using the VoteRank algorithm [2]. This algorithm determines influential nodes through an iterative voting mechanism. In each turn, every node votes for its immediate neighbours, and the node receiving the highest number of votes is elected as a spreader. The voting ability of the neighbours of the elected node is then reduced in subsequent turns to prevent redundancy. This process continues until the desired number of spreaders is identified. For details, please see [3].

The identified spreaders can be explored for various applications, such as potential drug targets. As illustrated in Figure 16, k2p identifies the spreaders in the PPI network and lists their topological properties, cellular localisation, class, available PDB complexes and ligands. A clustergram of pathways related to the spreaders is then shown to give an idea of which pathways are modified by the network's top spreaders. This clustergram and the high tissue specificity of these influential spreaders can be exported in .png format. The spreaders can be targets or triggers, depending on the context of the study.
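
The voting procedure described above can be sketched as follows. This is a minimal standard-library rendering of the scheme in [2], written for illustration rather than taken from k2p: the function name, the toy network, the alphabetical tie-breaking, and the reduction of a neighbour's voting ability by the inverse of the average degree are stated here as assumptions of the sketch.

```python
def voterank(adj, num_spreaders):
    """Elect up to num_spreaders nodes by iterative neighbour voting.

    adj maps each node to the set of its neighbours (undirected network).
    """
    n = len(adj)
    if n == 0:
        return []
    avg_degree = sum(len(nb) for nb in adj.values()) / n
    if avg_degree == 0:
        return []
    decay = 1.0 / avg_degree              # reduction applied to neighbours
    ability = dict.fromkeys(adj, 1.0)     # every node starts with one full vote
    elected = []
    while len(elected) < num_spreaders:
        # Each candidate's score is the total voting ability of its neighbours.
        scores = {
            v: sum(ability[w] for w in adj[v]) for v in adj if v not in elected
        }
        if not scores:
            break
        # Ties are broken alphabetically (an arbitrary choice for this sketch).
        best = max(sorted(scores), key=scores.get)
        if scores[best] <= 0:
            break                         # nobody has any votes left to give
        elected.append(best)
        ability[best] = 0.0               # an elected node no longer votes
        for w in adj[best]:
            ability[w] = max(0.0, ability[w] - decay)
    return elected


# Hypothetical toy network with two hubs joined by one edge.
toy = {
    "H1": {"a", "b", "c", "H2"},
    "H2": {"d", "e", "H1"},
    "a": {"H1"}, "b": {"H1"}, "c": {"H1"},
    "d": {"H2"}, "e": {"H2"},
}

if __name__ == "__main__":
    print(voterank(toy, 2))
```

Weakening the votes of an elected node's neighbours is what spreads the selected nodes across the network instead of concentrating them in a single dense region.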

d. Spreader-Hallmark associations

Protein-hallmark associations are another crucial feature of k2p. Every disease is driven by specific characteristics, or hallmarks. In konnect2prot 2.0, hallmarks refer to key pathological traits associated with diseases, derived from resources such as CancerGeneNet. For example, in cancer studies, hallmarks represent essential processes such as sustaining proliferative signalling and evading growth suppressors, as described in [4]. Identifying the proteins associated with these hallmarks helps to find new therapeutic targets with more specific pharmacological activity. Various drugs are deliberately developed for specific molecular targets that involve these hallmarks [5]. To address this, k2p incorporates two crucial aspects of drug discovery: protein-hallmark associations and protein-signalling pathway associations (Figure 18). The latter enable the identification of not just intra-pathway deregulation but also the interdependence of pathways. This information can in turn be used to deduce the pleiotropic effects of a large number of genes on distinct pathways that contribute to the development of specific disease characteristics or traits. A directed bipartite graph illustrating hallmark signalling is presented in Figure 18(A), where black dots represent spreader genes and black dots indicate cancer hallmarks. Another directed bipartite graph, illustrated in Figure 18(B), maps spreader genes, shown as black dots, and their corresponding targets, marked by red dots, to the associated signalling pathways.
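
A bipartite association graph of this kind can be represented as a list of directed gene-to-hallmark pairs. The sketch below uses placeholder spreader names and made-up associations purely to show the data structure and how genes linked to several hallmarks could be picked out; none of the pairs come from k2p or CancerGeneNet.

```python
from collections import defaultdict

# Hypothetical directed edges: (spreader gene, hallmark).
associations = [
    ("SPREADER_1", "sustaining proliferative signalling"),
    ("SPREADER_1", "evading growth suppressors"),
    ("SPREADER_2", "evading growth suppressors"),
]


def bipartite_views(pairs):
    """Index the bipartite graph from both sides."""
    by_gene, by_hallmark = defaultdict(set), defaultdict(set)
    for gene, hallmark in pairs:
        by_gene[gene].add(hallmark)
        by_hallmark[hallmark].add(gene)
    return dict(by_gene), dict(by_hallmark)


def multi_hallmark_genes(by_gene, min_hallmarks=2):
    """Genes associated with at least min_hallmarks distinct hallmarks."""
    return sorted(g for g, h in by_gene.items() if len(h) >= min_hallmarks)


by_gene, by_hallmark = bipartite_views(associations)

if __name__ == "__main__":
    print(multi_hallmark_genes(by_gene))
    print(sorted(by_hallmark["evading growth suppressors"]))
```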

References

1. Ashtiani M, Salehzadeh-Yazdi A, Razaghi-Moghadam Z, Hennig H, Wolkenhauer O, Mirzaie M, Jafari M. A systematic survey of centrality measures for protein-protein interaction networks. BMC Systems Biology. 2018;12(1):1–17.
2. Zhang J-X, Chen D-B, Dong Q, Zhao Z-D. Identifying a set of influential spreaders in complex networks. Scientific Reports. 2016;6:27823.
3. Kumar S, Sarmah DT, Asthana S, Chatterjee S. konnect2prot: a web application to explore the protein properties in a functional protein–protein interaction network. Bioinformatics. 2023;39. doi:10.1093/bioinformatics/btac815
4. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–674.
5. Bailón-Moscoso N, Romero-Benavides JC, Ostrosky-Wegman P. Development of anticancer drugs based on the hallmarks of tumor cells. Tumor Biology. 2014;35(5):3981–3995.